Cell and tumor classification using gene expression data: construction of forests.

نویسندگان

  • Heping Zhang
  • Chang-Yung Yu
  • Burton Singer
چکیده

The advent of gene chips has led to a promising technology for cell, tumor, and cancer classification. We exploit and expand the methodology of recursive partitioning trees for tumor and cell classification from microarray gene expression data. To improve classification and prediction accuracy, we introduce a deterministic procedure to form forests of classification trees and compare their performance with extant alternatives. When two published and commonly used data sets are used, we find that the deterministic forests perform similarly to the random forests in terms of the error rate obtained from the leave-one-out procedure, and all of the forests are far better than the single trees. In addition, we provide graphical presentations to facilitate interpretation of complex forests and compare our findings with the current biological literature. In addition to numerical improvement, the main advantage of deterministic forests is reproducibility and scientific interpretability of all steps in tree construction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

We can reach by DNA microarray gene expression to such wealth of information with thousands of variables (genes). Analysis of this information can show genetic reasons of disease and tumor differences. In this study we try to reduce high-dimensional data by statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumor based on gene expression data of...

متن کامل

خوشه‌بندی داده‌های بیان‌ژنی توسط عدم تشابه جنگل تصادفی

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data are typically involve in a large number of variables (genes), in comparison with number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations that are calculated by a distance function. As increa...

متن کامل

Down-regulation of HSP40 gene family following OCT4B1 suppression in human tumor cell lines

Objective(s): The OCT4B1, as one of OCT4 variants, is expressed in cancer cell lines and tissues more than other variants and plays an important role in apoptosis and stress (heat shock protein) pathways. The present study was designed to determine the effects of OCT4B1 silencing on expressional profile of HSP40 gene family expression in three different human tumor cell lines. Materials and Met...

متن کامل

Mapping of TP53 protein network using cytoscape software

TP53 acts as a tumor suppressor in cancer. It induces cell cycle arrest or apoptosis in response to cellular stress and damage. p53 gene alteration could cause uncontrolled cell proliferation.In the present study, we used TP53 gene as the seed in the construction of a protein-protein functional association network to identify genes that might involve in tumorgenesis process with TP53. TP53 prot...

متن کامل

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Proceedings of the National Academy of Sciences of the United States of America

دوره 100 7  شماره 

صفحات  -

تاریخ انتشار 2003